ABSTRACT



Median finding is an essential problem in statistics: the median provides a more
robust notion of average than the mean. The median is a measure of location that
identifies the middle of a collection of ordered multinomial values. This paper
presents a Bayesian procedure for selecting the median population. A prior
distribution and a linear loss function are used to find the Bayes risk for the
case in which the number of observations is odd.
1- INTRODUCTION



Over the last twenty years, considerable effort has been expended to develop
statistically valid ranking-and-selection (R&S) procedures to compare a finite
number of simulated alternatives. At least four classes of comparison problems
arise in simulation studies: selecting the system with the largest or smallest
expected performance measure (selection of the best), comparing all
alternatives against a standard (comparison with a standard), selecting the
system with the largest probability of actually being the best performer
(multinomial selection), and selecting the system with the largest probability
of success (Bernoulli selection). The ranking and selection approach asks a
different question: given the data on the distributions of these K populations,
what is the probability that we can correctly rank them from worst to best?
What is the probability that we can choose the best population (perhaps the one
with the largest population mean), or at least the best M out of the K
populations? [9], [5, 1].

In many situations, however, we want to select the median value (alternative)
from among the alternatives. How can we get something close to the median
reasonably quickly? As in "quick selection", the question is: given an unsorted
array, how quickly can one select the median element? Median finding is a
special case of the more general selection problem, which asks for the m-th
element in sorted order. The median provides a better measure of location than
the mean when there are some extremely large or small observations (i.e., when
the data are skewed to the right or the left). For this reason, median income
is used as the measure of location for U.S. household income [6].



There are several works in the literature treating the exact median selection
problem (cf. [BFPRT73], [DZ99], [FJ80], [FR75], [Hoa61], [HPM97]).
Traditionally, the "comparison cost model" is adopted, where the only factor
considered in the algorithm cost is the number of key comparisons. The best
upper bound on this cost found so far is nearly 3n comparisons in the worst case
(cf. [DZ99]). The algorithm described here approximates the median with high
precision and lends itself to an immediate implementation.
The usefulness of such an algorithm is evident for all applications where it is
sufficient to find an approximate median, for example in some heapsort variants
or for median filtering in image representation. In addition, the analysis of its
precision is of independent interest. All the works mentioned above, as well as
ours, assume the selection is from values stored in an array in main memory.
The algorithm has an additional property which, as we found recently, has led to
its being discovered before, albeit for solving a different problem [8].

Because most of these works find only an approximate median, we suggest a new
approach: selecting the median by a Bayesian selection procedure among multiple
populations (categories), introduced in Section 4. We derive the stopping risk
(Bayes risk) of making decision d_i for a linear loss function in Section 5.



2- The Median 



2.1 A Brief Overview of the Median



In probability theory and statistics, a median is described as the numeric value
separating the higher half of a sample, a population, or a probability 
distribution, from the lower half. The median of a finite list of numbers can be 
found by arranging all the observations from lowest value to highest value and 
picking the middle one. If there is an even number of observations, then there is 
no single middle value, so one often takes the mean of the two middle values. 
In a sample of data, or a finite population, there may be no member of the 
sample whose value is identical to the median (in the case of an even sample 
size) and, if there is such a member, there may be more than one so that the 
median may not uniquely identify a sample member. Nonetheless the value of 
the median is uniquely determined with the usual definition. 

At most half the population has values less than the median and at most half
have values greater than the median. If both groups contain less than half the
population, then some of the population is exactly equal to the median. For
example, if a < b < c, then the median of the list {a, b, c} is b, and if
a < b < c < d, then the median of the list {a, b, c, d} is the mean of b and c,
i.e. it is (b + c)/2 [12].
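
The odd and even cases described above can be sketched in a few lines of Python.
This is only an illustration of the usual definition; the function name `median`
and the sample lists are chosen here and do not come from the paper.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is a single middle value.
        return ordered[mid]
    # Even count: take the mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 1, 2]))     # odd length: the middle value b
print(median([4, 1, 3, 2]))  # even length: (b + c) / 2
```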

The median can be used as a measure of location when a distribution is
skewed, when end values are not known, or when one requires reduced
importance to be attached to outliers, e.g. because they may be measurement
errors. A disadvantage of the median is the difficulty of handling it
theoretically.

The median, calculated by determining the midpoint of rank-ordered cases,
can be used with ordinal, interval, or ratio measurements, and no assumptions
need be made about the shape of the distribution. The median has another
attractive feature: it is a resistant measure. That means it is not much affected by
changes in a few cases. Intuitively, this suggests that significant errors of
observation in several cases will not greatly distort the results. Because it is a
resistant measure, outliers have less influence on the median than on the mean.
For example, notice that the observations 1, 4, 4, 5, 7, 7, 8, 8, 9 have the same
median (7) as the observations 1, 4, 4, 5, 7, 7, 8, 8, 542. The means (5.89 and
65.11, respectively), however, are quite different because of the outlier, 542,
in the second set of observations [10].
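
The two sets of observations in this example can be recomputed with Python's
standard `statistics` module. The variable names are illustrative only.

```python
from statistics import mean, median

clean = [1, 4, 4, 5, 7, 7, 8, 8, 9]
with_outlier = [1, 4, 4, 5, 7, 7, 8, 8, 542]

# The median ignores how extreme the largest value is; the mean does not.
for name, data in (("clean", clean), ("with outlier", with_outlier)):
    print(f"{name}: median = {median(data)}, mean = {mean(data):.2f}")
```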

Hakimi (1964, 1965) was the first to formulate the problem of locating
a single median and multiple medians. He also proposed a simple enumeration
procedure to solve the problem. The problem is well known to be NP-hard (Garey
and Johnson 1979). Several heuristics have been developed for p-median problems.
Some of them are used to obtain good initial solutions or to calculate
intermediate solutions on search tree nodes. Teitz and Bart (1968) proposed
simple interchange heuristics (see also Maranzana 1964). More complete
approaches explore a search tree. They appeared in Efroymson and Ray (1966),
Jarvinen and Rajala (1972), Neebe (1978), Christofides and Beasley (1982),
Galvao and Raggi (1989) and Beasley (1993). The combined use of
Lagrangean relaxation and subgradient optimization in a primal-dual viewpoint
was found to be a good solution approach to the problem (Christofides and
Beasley 1982), (Galvao and Raggi 1989), (Beasley 1993) [3].

2.2 Theoretical properties [2,12] 

♦ An optimality property 

A median is also a central point which minimizes the average of the absolute
deviations. For example, for the sample 1, 2, 2, 2, 3, 9 the median is 2 and the
average absolute deviation about it is

(1 + 0 + 0 + 0 + 1 + 7) / 6 = 1.5;

in contrast, the mean (about 3.17), which minimizes the sum of squared
deviations, gives an average absolute deviation of 1.944.
In the language of statistics, a value of c that minimizes

E(|X - c|)

is a median of the probability distribution of the random variable X. However, a
median c need not be uniquely defined. Where exactly one median exists,
statisticians speak of "the median" correctly; even when no unique median
exists, some statisticians speak of "the median" informally.
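
The optimality property can be checked numerically. The sketch below assumes a
small sample (1, 2, 2, 2, 3, 9) chosen so that it reproduces the figures 1.5 and
1.944; it scans a grid of candidate centres c and reports which one minimizes
the average absolute deviation and which one minimizes the average squared
deviation. The helper names are illustrative.

```python
from statistics import mean, median


def mean_abs_dev(data, c):
    """Average absolute deviation of the data about the point c."""
    return sum(abs(x - c) for x in data) / len(data)


def mean_sq_dev(data, c):
    """Average squared deviation of the data about the point c."""
    return sum((x - c) ** 2 for x in data) / len(data)


data = [1, 2, 2, 2, 3, 9]            # assumed sample, not taken from the paper
med, avg = median(data), mean(data)

# Candidate centres 0.00, 0.01, ..., 10.00.
grid = [i / 100 for i in range(1001)]
best_abs = min(grid, key=lambda c: mean_abs_dev(data, c))
best_sq = min(grid, key=lambda c: mean_sq_dev(data, c))

print("median:", med, "minimizer of absolute deviations:", best_abs)
print("mean:", round(avg, 3), "minimizer of squared deviations:", best_sq)
print("average absolute deviation about the median:", mean_abs_dev(data, med))
print("average absolute deviation about the mean:", round(mean_abs_dev(data, avg), 3))
```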



♦ An inequality relating means and medians 

For continuous probability distributions, the absolute difference between the
median and the mean is less than or equal to one standard deviation.
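
As a quick empirical illustration of this inequality, the following sketch
compares |mean - median| with the standard deviation on a few finite samples.
It uses the population standard deviation (`pstdev`) of each sample; the sample
choices, seed, and function name are assumptions made here for illustration.

```python
import random
from statistics import mean, median, pstdev


def gap_within_one_sd(data):
    """True if |mean - median| does not exceed the population standard deviation."""
    return abs(mean(data) - median(data)) <= pstdev(data)


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = {
    "exponential": [rng.expovariate(1.0) for _ in range(1000)],
    "lognormal": [rng.lognormvariate(0.0, 1.5) for _ in range(1000)],
    "with outlier": [1, 4, 4, 5, 7, 7, 8, 8, 542],
}

for name, data in samples.items():
    print(name, gap_within_one_sd(data))
```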

Median has different meanings in different contexts:

♦ Median in statistics, a number that separates the lowest-value half and
the highest-value half

♦ Median (geometry), a line joining a vertex of a triangle to
the midpoint of the opposite side

♦ Geometric median, a point minimizing the sum of distances to a given
set of points

♦ Median (road), the portion of a divided highway used to separate
opposing traffic

♦ Median filter (image processing), used to reduce noise in images

♦ Median, an adjective related to the Medes, an Iranian people

♦ The Median language, the language of the Medes

♦ The median nerve

Some Uses [4]



1- Median filtering is a commonly used technique in signal processing. It is
very useful in Vertical Seismic Profile (VSP) data processing and in automatic
editing of surface seismic data. It is also used in seismic processing to
enhance linear events.



2- Finding the item which is smaller than half of the items and bigger than
half of the items.



3- Median crossovers are provided at selected locations on divided highways
for crossing by maintenance, traffic service, emergency, and law
enforcement vehicles.



4- Median data are used to link adjacent roadways and are similar to
template data in that they give a shape to be constructed. Again, a table of
typical medians is created.



5- Median spaces are a natural generalization of CAT(0) cube complexes, and
property (T) and the Haagerup property can be characterized using median
spaces.



6- The median can be thought of as the geometric middle, while the mean is the
arithmetic middle; the geometric nature of the median results in it not
being influenced by a few large numbers at either extreme.
3- The Origin of the Problem



In many situations we need to select the median term, such as the median
high temperature for the week or the median among a set of values or a variety
of averages. We consider the case in which the number of observations is odd.
Suppose we have a multinomial distribution with k cells and unknown
probabilities p_i of an observation falling in the i-th cell. The probability
mass function is given below:

$$P(n \mid p) = \frac{m!}{n_1!\, n_2! \cdots n_k!} \prod_{i=1}^{k} p_i^{n_i}, \qquad \sum_{i=1}^{k} n_i = m, \quad \sum_{i=1}^{k} p_i = 1, \quad n = (n_1, \ldots, n_k),$$

(i = 1, 2, ..., m, ..., k). It is required to find the cell with the
median (bigger than half) probability (the best cell in this sense).

Let $p_{[1]} \le p_{[2]} \le \cdots \le p_{[m]} \le \cdots \le p_{[k]}$ denote the
ordered values of the $p_i$ ($1 \le i \le k$). The goal of the experimenter is to
select the median cell probability, that is, the cell associated with $p_{[m]}$.
Here the subscript [m] marks the median rank and is distinct from the sample
size m.
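
The setting can be made concrete with a short sketch that evaluates the
multinomial probability of a vector of counts and locates the median cell of a
known probability vector when k is odd. The probabilities and counts below are
hypothetical, the function names are chosen here, and cell indices are
zero-based.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """P(n | p) = m! / (n_1! ... n_k!) * prod_i p_i ** n_i, with m = sum(counts)."""
    coef = factorial(sum(counts))
    for n_i in counts:
        coef //= factorial(n_i)
    return coef * prod(p ** n for p, n in zip(probs, counts))


def median_cell(probs):
    """Zero-based index of the cell holding the median probability (k must be odd)."""
    k = len(probs)
    if k % 2 == 0:
        raise ValueError("the median cell is defined here only for an odd number of cells")
    order = sorted(range(k), key=lambda i: probs[i])
    return order[(k - 1) // 2]


probs = [0.5, 0.2, 0.3]   # hypothetical cell probabilities
counts = [2, 1, 1]        # hypothetical observed frequencies, m = 4

print(multinomial_pmf(counts, probs))
print(median_cell(probs))
```
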
4- The Bayesian Median Procedure 



Before we introduce the Bayesian procedures, we introduce some
standard definitions and notation which are needed to construct the
procedures. Let

$$\Omega_k = \left\{ p = (p_1, p_2, \ldots, p_k) : \sum_{i=1}^{k} p_i = 1,\ p_i \ge 0 \right\}$$

be the parameter space and $D = \{d_1, d_2, \ldots, d_k\}$ be the decision space,
where in the following terminal k-decision rule:

d_i : p_i is the median cell probability (i = 1, 2, ..., m, ..., k).

That is, d_i denotes the decision to select the event associated with the i-th
cell as the bigger than half probable event, after the sampling is terminated.

Suppose the loss function in making decision d_i, defined on $\Omega_k \times D$,
is given as follows:

$$L(d_i, p) = \begin{cases} k^{*}\,(p_{[m]} - p_i) & \text{if } p_{[m]} \ne p_i \\ 0 & \text{if } p_{[m]} = p_i \end{cases} \qquad (4.1)$$

That is the loss if decision d_i is made when the true value of the probability
vector is p, where k* is the loss constant, giving losses in terms of cost.
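
A minimal sketch of the linear loss, assuming the true probability vector is
known, is given below. The function name, the zero-based cell index, and the
example values are illustrative. The expression is implemented exactly as
written, so it returns a negative value when the chosen cell's probability
exceeds the median probability.

```python
def linear_loss(i, probs, k_star=1.0):
    """Loss k* (p_[m] - p_i) of choosing cell i; zero when cell i is the median cell."""
    k = len(probs)
    if k % 2 == 0:
        raise ValueError("the median cell is defined here only for an odd number of cells")
    p_median = sorted(probs)[(k - 1) // 2]
    return k_star * (p_median - probs[i])


probs = [0.5, 0.2, 0.3]   # hypothetical true cell probabilities
for i in range(len(probs)):
    print(i, linear_loss(i, probs, k_star=2.0))
```
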
The Bayesian approach requires that we specify a prior probability density
function $\pi(p)$, expressing our beliefs about p before we obtain the data. From
a mathematical point of view, it would be convenient if p is assigned a prior
distribution which is a member of a family of distributions closed under
multinomial sampling, that is, a member of the conjugate family. The conjugate
family in this case is the family of Dirichlet distributions. Accordingly, let
p be assigned a Dirichlet prior distribution with parameters
$m'; n'_1, n'_2, \ldots, n'_k$. The normalized density function is given by [11]

$$\pi(p) = \frac{\Gamma\left(\sum_{i=1}^{k} n'_i\right)}{\prod_{i=1}^{k} \Gamma(n'_i)} \prod_{i=1}^{k} p_i^{\,n'_i - 1}, \qquad \text{where } m' = \sum_{i=1}^{k} n'_i, \qquad (4.2)$$

and the marginal distribution for $p_i$ is the Beta density

$$f(p_i) = \frac{(m'-1)!}{(n'_i-1)!\,(m'-n'_i-1)!}\, p_i^{\,n'_i-1} (1-p_i)^{m'-n'_i-1}.$$

Here $n'_i$ (i = 1, 2, ..., k) are regarded as hyperparameters specifying the
prior distribution. They can be thought of as "imaginary counts" from prior
experience. If $N_i$ is the number of times that category i is chosen in m
independent trials, then $N = (N_1, \ldots, N_k)$ has a multinomial distribution
with probability mass function

$$\Pr(N_1 = n_1, N_2 = n_2, \ldots, N_k = n_k \mid p_1, \ldots, p_k) = P(n \mid p) = \frac{m!}{n_1!\, n_2! \cdots n_k!} \prod_{i=1}^{k} p_i^{n_i},$$

where $\sum_{i=1}^{k} n_i = m$ and $n = (n_1, \ldots, n_m, \ldots, n_k)$.

Since

$$P(n \mid p) \propto p_1^{n_1} \cdots p_k^{n_k} \quad \text{and} \quad \pi(p) \propto p_1^{\,n'_1-1} \cdots p_k^{\,n'_k-1},$$

the posterior is

$$\pi(p \mid n) \propto p_1^{\,n_1+n'_1-1} \cdots p_k^{\,n_k+n'_k-1}.$$

This is a member of the Dirichlet family with parameters

$$n''_i = n'_i + n_i \quad \text{and} \quad m'' = m' + m \qquad (i = 1, \ldots, k).$$

Hence, the posterior distribution has density function

$$\pi(p \mid n) = \frac{(m''-1)!}{(n''_1-1)! \cdots (n''_k-1)!}\; p_1^{\,n''_1-1} \cdots p_m^{\,n''_m-1} \cdots p_k^{\,n''_k-1} \qquad (4.3)$$

with posterior mean $\hat{p}_i = n''_i / m''$ (i = 1, 2, ..., k); $n''_i$ will be
termed the posterior frequency in the i-th cell. The marginal posterior
distribution for $p_i$ is the beta distribution with probability density function

$$f(p_i) = \frac{\Gamma(m'')}{\Gamma(n''_i)\,\Gamma(m''-n''_i)}\, p_i^{\,n''_i-1} (1-p_i)^{m''-n''_i-1}.$$
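
The conjugate update amounts to adding the observed counts to the prior
"imaginary counts". A minimal sketch follows; the prior and observed counts are
hypothetical and the function names are chosen here for illustration.

```python
def posterior_parameters(prior_counts, observed_counts):
    """Dirichlet update: posterior count = prior count + observed count, cell by cell."""
    post = [a + n for a, n in zip(prior_counts, observed_counts)]
    return post, sum(post)


def posterior_means(post_counts, m_post):
    """Posterior mean of each cell probability: posterior count divided by their total."""
    return [n / m_post for n in post_counts]


prior = [1, 1, 1]       # hypothetical prior counts, total 3
observed = [2, 5, 3]    # hypothetical observed frequencies, total 10

post, m_post = posterior_parameters(prior, observed)
means = posterior_means(post, m_post)
print(post, m_post)
print([round(x, 4) for x in means])
```
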
5- The Stopping Risks of the Median Procedure



Consider a multinomial distribution which is characterized by k events
(cells) with probability vector $p = (p_1, p_2, \ldots, p_k)$, where $p_i$ is the
probability of the event $E_i$ ($1 \le i \le k$) with $\sum_{i=1}^{k} p_i = 1$.
Let $n_1, n_2, \ldots, n_m, \ldots, n_k$ be the respective frequencies in the k
cells of the distribution with $\sum_{i=1}^{k} n_i = m$. Further, let
$p_{[1]} \le p_{[2]} \le \cdots \le p_{[m]} \le \cdots \le p_{[k]}$ denote the
ordered values of the $p_i$ ($1 \le i \le k$). It is assumed that the values of
the $p_i$ and of the $p_{[j]}$ ($1 \le i, j \le k$) are completely unknown.
The goal of the experimenter is to select the bigger than half probable event,
that is, the event associated with $p_{[m]}$, also called the median cell.

The stopping risk (the posterior expected loss) of the terminal decision d_i
when the posterior distribution for p has parameters
$(n''_1, n''_2, \ldots, n''_m, \ldots, n''_k; m'')$, that is, when the sample path
has reached $(n''_1, n''_2, \ldots, n''_m, \ldots, n''_k; m'')$ from the origin
$(n'_1, n'_2, \ldots, n'_m, \ldots, n'_k; m')$, is denoted by
$S_i(n''_1, n''_2, \ldots, n''_m, \ldots, n''_k; m'')$ and can be found as follows:

$$S_i(n''_1, n''_2, \ldots, n''_m, \ldots, n''_k; m'') = E_{\pi(p \mid n)}\left[L(d_i, p)\right] = k^{*}\left[E_{\pi(p \mid n)}(p_{[m]}) - \frac{n''_i}{m''}\right] \qquad (5.1)$$

The value of $E_{\pi(p \mid n)}(p_{[m]})$ is derived as follows:

$$E_{\pi(p \mid n)}(p_{[m]}) = \int_0^1 p_{[m]}\, g(p_{[m]})\, dp_{[m]}.$$
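
As a cross-check on the stopping risk, the posterior expectation of the median
cell probability can also be estimated by simulation instead of by the analytic
derivation. The sketch below is a Monte Carlo approximation, not the paper's
closed-form method: it draws probability vectors from the Dirichlet posterior
(via normalized gamma variates), averages the median component, and plugs the
estimate into k* [E(p_[m]) - n''_i / m'']. The posterior counts, loss constant,
number of draws, and function names are all assumptions made here.

```python
import random


def sample_dirichlet(alphas, rng):
    """One draw from a Dirichlet distribution via normalized gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def stopping_risks(post_counts, k_star=1.0, n_draws=20000, seed=0):
    """Monte Carlo estimate of k* (E[p_[m]] - n''_i / m'') for every cell i."""
    k = len(post_counts)
    if k % 2 == 0:
        raise ValueError("the median cell is defined here only for an odd number of cells")
    rng = random.Random(seed)
    m_post = sum(post_counts)
    total = 0.0
    for _ in range(n_draws):
        p = sample_dirichlet(post_counts, rng)
        total += sorted(p)[(k - 1) // 2]   # median component of this draw
    e_median = total / n_draws
    return [k_star * (e_median - n / m_post) for n in post_counts]


risks = stopping_risks([3, 6, 4])   # hypothetical posterior counts
print([round(r, 4) for r in risks])
```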



So, if the number of observations is odd, then

$$g(p_{[m]}) = k \binom{k-1}{\frac{k+1}{2}-1} \left[F(p_{[m]})\right]^{\frac{k+1}{2}-1} \left[1 - F(p_{[m]})\right]^{k-\frac{k+1}{2}} f(p_{[m]}) \qquad (5.2)$$

is the probability density function of the median order statistic $p_{[m]}$,
where the marginal posterior probability density function of $p_i$ if
$p_i = p_{[m]}$ is

$$f(p_{[m]}) = \frac{(m''-1)!}{(n''_{[m]}-1)!\,(m''-n''_{[m]}-1)!}\; p_{[m]}^{\,n''_{[m]}-1} (1-p_{[m]})^{m''-n''_{[m]}-1} \qquad (5.3)$$

and the cumulative distribution function is

$$F(p_{[m]}) = \sum_{j=n''_{[m]}}^{m''-1} \frac{(m''-1)!}{j!\,(m''-1-j)!}\; p_{[m]}^{\,j} (1-p_{[m]})^{m''-1-j},$$

such that the ordered values of $n''_1, n''_2, \ldots, n''_m, \ldots, n''_k$ are
$n''_{[1]} \le n''_{[2]} \le \cdots \le n''_{[m]} \le \cdots \le n''_{[k]}$.
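
The finite-sum form of the cumulative distribution function relies on a standard
identity: for a Beta density with integer parameters, the distribution function
equals a binomial tail sum. The sketch below checks this numerically by
comparing the sum with a Simpson's-rule integral of the density. The function
names and the parameter values (a cell count of 6 out of a total of 13) are
illustrative assumptions; the code requires 1 <= n_i <= m - 1.

```python
from math import comb, factorial


def beta_pdf(p, n_i, m):
    """Beta density with integer parameters n_i and m - n_i, evaluated at p."""
    coef = factorial(m - 1) // (factorial(n_i - 1) * factorial(m - n_i - 1))
    return coef * p ** (n_i - 1) * (1 - p) ** (m - n_i - 1)


def beta_cdf_sum(p, n_i, m):
    """Distribution function written as a binomial tail sum from j = n_i to m - 1."""
    return sum(comb(m - 1, j) * p ** j * (1 - p) ** (m - 1 - j)
               for j in range(n_i, m))


def beta_cdf_numeric(p, n_i, m, steps=2000):
    """Distribution function by composite Simpson integration of the density on [0, p]."""
    h = p / steps                      # steps must be even for Simpson's rule
    total = beta_pdf(0.0, n_i, m) + beta_pdf(p, n_i, m)
    for s in range(1, steps):
        total += (4 if s % 2 else 2) * beta_pdf(s * h, n_i, m)
    return total * h / 3


print(beta_cdf_sum(0.4, 6, 13))
print(beta_cdf_numeric(0.4, 6, 13))
```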



Then,

$$E_{\pi(p \mid n)}(p_{[m]}) = \int_0^1 p_{[m]}\, g(p_{[m]})\, dp_{[m]} = k \binom{k-1}{\frac{k+1}{2}-1} \int_0^1 p_{[m]} \left[F(p_{[m]})\right]^{\frac{k+1}{2}-1} \left[1 - F(p_{[m]})\right]^{k-\frac{k+1}{2}} f(p_{[m]})\, dp_{[m]},$$

where $f(p_{[m]})$ is the marginal posterior Beta density (5.3) and $F(p_{[m]})$
is its cumulative distribution function.

[The term-by-term expansion of this integral into multiple sums over
j_1, ..., j_k, and its closed-form evaluation, are not legible in the source.]

Hence

$$S_i(n''_1, n''_2, \ldots, n''_k; m'') = k^{*}\left[E_{\pi(p \mid n)}(p_{[m]}) - \frac{n''_i}{m''}\right] \qquad (5.4)$$

with $E_{\pi(p \mid n)}(p_{[m]})$ given by the integral above.




6 - Future Work 



Our plan for the future is to produce some numerical and simulation results
for this procedure.

Fully sequential and group sequential Bayesian schemes for the median
multinomial selection problem can be developed.

An upper bound for the risks may be found using functional analysis.

General loss functions may be used, where linear loss is considered as a
special case.

To simplify formula (5.4), we can use Stirling's approximation for
large factorials and hence obtain an approximate formula for (5.4).
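
To give a sense of how accurate Stirling's approximation is for the factorials
that appear in the formulas, the sketch below compares ln n! computed exactly
(through `math.lgamma`) with the approximation
n ln n - n + (1/2) ln(2 pi n). The function name and the chosen values of n are
illustrative; how the approximation would be applied to (5.4) is left open here.

```python
import math


def log_factorial_stirling(n):
    """Stirling's approximation to ln(n!): n ln n - n + 0.5 ln(2 pi n)."""
    return n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)


for n in (5, 20, 100):
    exact = math.lgamma(n + 1)          # ln(n!) computed exactly
    approx = log_factorial_stirling(n)
    # The gap shrinks roughly like 1 / (12 n) as n grows.
    print(n, round(exact, 6), round(approx, 6), round(exact - approx, 6))
```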